# SigLIP Vision Encoder
| Model | Author | License | Task | Library | Description | Downloads | Likes |
|---|---|---|---|---|---|---|---|
| `vit_so400m_patch14_siglip_gap_448.pali_mix` | timm | Apache-2.0 | Text-to-Image | Transformers | Vision-language model built on the SigLIP image encoder with global average pooling, suited to multimodal tasks. | 15 | 0 |
| `vit_large_patch16_siglip_384.webli` | timm | Apache-2.0 | Image Classification | Transformers | SigLIP Vision Transformer containing only the image encoder, with the original attention pooling; suited to image feature extraction. | 64 | 0 |
| `vit_base_patch16_siglip_384.webli` | timm | Apache-2.0 | Image Classification | Transformers | SigLIP Vision Transformer containing only the image encoder, with the original attention pooling. | 64 | 1 |
| `vit_so400m_patch14_siglip_224.webli` | timm | Apache-2.0 | Image Classification | Transformers | SigLIP Vision Transformer containing only the image encoder, with the original attention pooling. | 123 | 1 |
| nanoLLaVA-1.5 | qnguyen3 | Apache-2.0 | Image-to-Text | Transformers (English) | Vision-language model with under 1 billion parameters, designed for edge devices; compact yet powerful. | 442 | 109 |